This project uses Python to predict medical insurance premiums with linear regression. By analyzing factors such as age, weight, and various health conditions, a model was developed that estimates premium prices. The results provide useful insights for cost management and insurance planning.
Data types for each column were analyzed to confirm the data is in the correct format and to help select appropriate feature engineering techniques.
The following provides a concise summary of the DataFrame. Every column reports 986 non-null values for 986 entries, so there are no missing values in the dataset. The summary also shows 12 columns: six of type int64 (integers) and six of type object (i.e., categories). The data is therefore complete, with no missing values to impute.
medical_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 986 entries, 0 to 985
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Age                      986 non-null    int64
 1   Diabetes                 986 non-null    object
 2   BloodPressureProblems    986 non-null    int64
 3   AnyTransplants           986 non-null    object
 4   AnyChronicDiseases       986 non-null    object
 5   Height                   986 non-null    int64
 6   Weight                   986 non-null    int64
 7   KnownAllergies           986 non-null    int64
 8   HistoryOfCancerInFamily  986 non-null    object
 9   NumberOfMajorSurgeries   986 non-null    object
 10  PremiumPrice             986 non-null    int64
 11  BloodPressureProblem     986 non-null    object
dtypes: int64(6), object(6)
memory usage: 92.6+ KB
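Reading the Non-Null Count column works, but the same check can be made explicit in code. The sketch below uses a small made-up DataFrame (`df` is a hypothetical stand-in for `medical_df`) with one deliberately missing value:

```python
import pandas as pd

# Toy frame standing in for medical_df; the values are made up for illustration.
df = pd.DataFrame({
    'Age': [18, 45, 66],
    'Diabetes': ['0', '1', '0'],
    'Weight': [60, None, 80],  # None becomes NaN, so this column is float64
})

# Count missing values per column; an all-zero result means nothing needs imputing.
missing = df.isna().sum()
print(missing)

# Tally how many columns use each dtype (compare with the "dtypes:" line of info()).
print(df.dtypes.value_counts())
```

Applied to the real data, `medical_df.isna().sum().sum()` should return 0, matching the Non-Null Counts above.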
Summary statistics for each column were reviewed to verify that the range of values for each feature was plausible.
Analysis:
The summary shows that patient ages range from 18 to 66, which is consistent with an adult insured population. This may also indicate that the insurance company does not accept applicants younger than 18 or older than 66. The same check was applied to the remaining features (e.g., Height from 145 to 188, Weight from 51 to 132, and PremiumPrice from 15,000 to 40,000), and all ranges were judged plausible.
medical_df.describe()
| | Age | Diabetes | BloodPressureProblems | AnyTransplants | AnyChronicDiseases | Height | Weight | KnownAllergies | HistoryOfCancerInFamily | NumberOfMajorSurgeries | PremiumPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 | 986.000000 |
| mean | 41.745436 | 0.419878 | 0.468560 | 0.055781 | 0.180527 | 168.182556 | 76.950304 | 0.215010 | 0.117647 | 0.667343 | 24336.713996 |
| std | 13.963371 | 0.493789 | 0.499264 | 0.229615 | 0.384821 | 10.098155 | 14.265096 | 0.411038 | 0.322353 | 0.749205 | 6248.184382 |
| min | 18.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 145.000000 | 51.000000 | 0.000000 | 0.000000 | 0.000000 | 15000.000000 |
| 25% | 30.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 161.000000 | 67.000000 | 0.000000 | 0.000000 | 0.000000 | 21000.000000 |
| 50% | 42.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 168.000000 | 75.000000 | 0.000000 | 0.000000 | 1.000000 | 23000.000000 |
| 75% | 53.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 176.000000 | 87.000000 | 0.000000 | 0.000000 | 1.000000 | 28000.000000 |
| max | 66.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 188.000000 | 132.000000 | 1.000000 | 1.000000 | 3.000000 | 40000.000000 |
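The range check described above was done by reading the table; it can also be written as an explicit test. A minimal sketch, assuming hypothetical plausibility bounds (only the Age bounds come from the observed minimum and maximum) and a toy `df` in place of `medical_df`:

```python
import pandas as pd

# Hypothetical plausibility bounds per column (inclusive); adjust to domain knowledge.
bounds = {
    'Age': (18, 66),
    'Height': (140, 200),
    'Weight': (40, 150),
}

# Toy stand-in for medical_df; the last Weight value is deliberately implausible.
df = pd.DataFrame({
    'Age': [18, 42, 66],
    'Height': [145, 168, 188],
    'Weight': [51, 75, 400],
})

def out_of_range(frame, limits):
    """Return the number of values outside [low, high] for each bounded column."""
    return {
        col: int((~frame[col].between(low, high)).sum())
        for col, (low, high) in limits.items()
    }

print(out_of_range(df, bounds))  # {'Age': 0, 'Height': 0, 'Weight': 1}
```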
Exploratory analysis was performed on the data by visualizing the distribution of various features and their relationships with PremiumPrice.
Analysis:
The distribution of each feature appeared consistent with the insured population: every feature's values fell within a plausible range.
medical_df.Age.describe()
import plotly.express as px
# Visualizing the Age distribution
fig = px.histogram(medical_df,
x='Age',
marginal='box',
nbins=48,
title='Distribution of Age')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='Height',
marginal='box',
color_discrete_sequence=['red'],
title='Distribution of Height')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='Weight',
marginal='box',
color_discrete_sequence=['purple'],
title='Distribution of Weight')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='Diabetes',
color_discrete_sequence=['green', 'grey'],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
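# Plot an object-typed copy of the flag so it is treated as categorical;
# the integer BloodPressureProblems column is kept for the correlation step later.
medical_df['BloodPressureProblem'] = medical_df['BloodPressureProblems'].astype('object')
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='BloodPressureProblem',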
color_discrete_sequence=['blue', 'grey'],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='AnyTransplants',
color_discrete_sequence=['purple', 'grey'],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='AnyChronicDiseases',
color_discrete_sequence=['orange', 'grey'],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='KnownAllergies',
color_discrete_sequence=['red', 'grey'],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='HistoryOfCancerInFamily',
color_discrete_sequence=['yellow', 'grey'],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
fig = px.histogram(medical_df,
x='PremiumPrice',
marginal='box',
color='NumberOfMajorSurgeries',
color_discrete_sequence=['pink', 'green', 'blue', 'grey' ],
title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
The relationship between Age and PremiumPrice was visualized using scatter plots, with other features used to color-code the points.
Analysis:
In each plot, PremiumPrice generally increases with Age. There is also considerable spread at every age, including some outliers. Even so, Age and PremiumPrice show a moderate positive correlation. The features used to color-code the plots do not show an obvious relationship with PremiumPrice.
fig = px.scatter(medical_df,
x='Age',
y='PremiumPrice',
color='AnyTransplants',
opacity=0.8,
title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
medical_df['AnyChronicDiseases'] = medical_df['AnyChronicDiseases'].astype('object')
fig = px.scatter(medical_df,
x='Age',
y='PremiumPrice',
color='AnyChronicDiseases',
opacity=0.8,
title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
medical_df['HistoryOfCancerInFamily'] = medical_df['HistoryOfCancerInFamily'].astype('object')
fig = px.scatter(medical_df,
x='Age',
y='PremiumPrice',
color='HistoryOfCancerInFamily',
opacity=0.8,
title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
medical_df['NumberOfMajorSurgeries'] = medical_df['NumberOfMajorSurgeries'].astype('object')
fig = px.scatter(medical_df,
x='Age',
y='PremiumPrice',
color='NumberOfMajorSurgeries',
opacity=0.8,
title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
The relationship between Weight and PremiumPrice was visualized using scatter plots, with other features used to color-code the points.
Analysis:
No clear trend or relationship between Weight and PremiumPrice is visible in any of these plots. There is also considerable spread at every weight, including some outliers. The features used to color-code the plots do not show an obvious relationship with PremiumPrice.
fig = px.scatter(medical_df,
x='Weight',
y='PremiumPrice',
color='AnyTransplants',
opacity=0.8,
title='Weight vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
medical_df['AnyChronicDiseases'] = medical_df['AnyChronicDiseases'].astype('object')
fig = px.scatter(medical_df,
x='Weight',
y='PremiumPrice',
color='AnyChronicDiseases',
opacity=0.8,
title='Weight vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
medical_df['HistoryOfCancerInFamily'] = medical_df['HistoryOfCancerInFamily'].astype('object')
fig = px.scatter(medical_df,
x='Weight',
y='PremiumPrice',
color='HistoryOfCancerInFamily',
opacity=0.8,
title='Weight vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
medical_df['NumberOfMajorSurgeries'] = medical_df['NumberOfMajorSurgeries'].astype('object')
fig = px.scatter(medical_df,
x='Weight',
y='PremiumPrice',
color='NumberOfMajorSurgeries',
opacity=0.8,
title='Weight vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
As the scatter plots show, Age is more closely related to PremiumPrice than other features such as Weight. The strength of a linear relationship can be expressed numerically with the correlation coefficient. Categorical columns were converted to numeric columns so the correlation could be computed.
Analysis:
The correlation coefficients confirmed that Age is the only feature with a moderate to strong relationship with PremiumPrice (r = 0.70), where moderate to strong means a coefficient between 0.5 and 1 or between -0.5 and -1. All other features have a weak relationship or none at all (every other coefficient is below 0.3).
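`Series.corr` computes Pearson's correlation coefficient by default. As a sketch of what that number is, the helper below (a hypothetical `pearson_r`, not part of the notebook) divides the covariance of two columns by the product of their standard deviations, using made-up ages and premiums, and cross-checks the result against `np.corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Made-up sample values, not taken from medical_df.
ages = [20, 30, 40, 50, 60]
premiums = [15000, 21000, 23000, 28000, 31000]

r = pearson_r(ages, premiums)
print(round(r, 3))
# Cross-check against NumPy's correlation matrix (off-diagonal entry).
print(np.isclose(r, np.corrcoef(ages, premiums)[0, 1]))  # True
```

On the real data, `pearson_r(medical_df.Age, medical_df.PremiumPrice)` should agree with the 0.6975 reported by `Series.corr`, up to floating-point rounding.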
medical_df.PremiumPrice.corr(medical_df.Age)
0.6975399655058031
medical_df.PremiumPrice.corr(medical_df.Weight)
0.14150740525639752
medical_df['AnyTransplants'] = medical_df['AnyTransplants'].astype('int')
medical_df.PremiumPrice.corr(medical_df.AnyTransplants)
0.2890559369634021
medical_df['AnyChronicDiseases'] = medical_df['AnyChronicDiseases'].astype('int')
medical_df.PremiumPrice.corr(medical_df.AnyChronicDiseases)
0.208610
medical_df['HistoryOfCancerInFamily'] = medical_df['HistoryOfCancerInFamily'].astype('int')
medical_df.PremiumPrice.corr(medical_df.HistoryOfCancerInFamily)
0.08313941651638145
medical_df['NumberOfMajorSurgeries'] = medical_df['NumberOfMajorSurgeries'].astype('int')
medical_df.PremiumPrice.corr(medical_df.NumberOfMajorSurgeries)
0.26424952935741713
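# numeric_only=True excludes object columns such as Diabetes (its default changed to False in pandas 2.0)
medical_df.corr(numeric_only=True)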
| | Age | BloodPressureProblems | AnyTransplants | AnyChronicDiseases | Height | Weight | KnownAllergies | HistoryOfCancerInFamily | NumberOfMajorSurgeries | PremiumPrice |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000000 | 0.244888 | -0.008549 | 0.051072 | 0.039879 | -0.018590 | -0.024416 | -0.027623 | 0.429181 | 0.697540 |
| BloodPressureProblems | 0.244888 | 1.000000 | -0.024538 | 0.045424 | -0.037926 | -0.061016 | -0.011550 | 0.048239 | 0.251568 | 0.167097 |
| AnyTransplants | -0.008549 | -0.024538 | 1.000000 | 0.035285 | -0.031543 | 0.002087 | 0.001876 | -0.020171 | -0.004154 | 0.289056 |
| AnyChronicDiseases | 0.051072 | 0.045424 | 0.035285 | 1.000000 | 0.047419 | -0.033318 | -0.027418 | 0.008666 | 0.014835 | 0.208610 |
| Height | 0.039879 | -0.037926 | -0.031543 | 0.047419 | 1.000000 | 0.066946 | -0.010200 | 0.010549 | 0.037289 | 0.026910 |
| Weight | -0.018590 | -0.061016 | 0.002087 | -0.033318 | 0.066946 | 1.000000 | 0.037492 | 0.003481 | -0.006108 | 0.141507 |
| KnownAllergies | -0.024416 | -0.011550 | 0.001876 | -0.027418 | -0.010200 | 0.037492 | 1.000000 | 0.115383 | 0.103923 | 0.012103 |
| HistoryOfCancerInFamily | -0.027623 | 0.048239 | -0.020171 | 0.008666 | 0.010549 | 0.003481 | 0.115383 | 1.000000 | 0.212657 | 0.083139 |
| NumberOfMajorSurgeries | 0.429181 | 0.251568 | -0.004154 | 0.014835 | 0.037289 | -0.006108 | 0.103923 | 0.212657 | 1.000000 | 0.264250 |
| PremiumPrice | 0.697540 | 0.167097 | 0.289056 | 0.208610 | 0.026910 | 0.141507 | 0.012103 | 0.083139 | 0.264250 | 1.000000 |
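import matplotlib.pyplot as plt
import seaborn as sns
# numeric_only=True keeps only numeric columns, matching the correlation table above
sns.heatmap(medical_df.corr(numeric_only=True), cmap='Reds', annot=True)
plt.title('Correlation Matrix');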
Based on our analysis, we know that Age and PremiumPrice have the strongest relationship. To further our analysis, a linear regression model was created to predict PremiumPrice ("target") using the following formula:
PremiumPrice = w * Age + b
Analysis:
A helper function was created to compute PremiumPrice from Age, w, and b. To approximate the slope and intercept (w and b), sample values were substituted into the formula and the resulting estimates were compared with the PremiumPrice column in the dataset. The estimates were also plotted against the actual data in scatter plots. This trial-and-error analysis suggests that w and b should be close to 300 and 11,000, respectively.
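The trial-and-error search described above can be made systematic by scoring every (w, b) pair on a grid with RMSE and keeping the best one. This is a brute-force sketch on synthetic data generated from assumed values w = 300 and b = 11,000 with no noise; the grid bounds and step sizes are arbitrary choices, and the `rmse` helper here mirrors the one the notebook defines for its loss calculation:

```python
import numpy as np

def estimate_premiums(age, w, b):
    """Linear model: premium = w * age + b."""
    return w * age + b

def rmse(targets, predictions):
    return float(np.sqrt(np.mean(np.square(targets - predictions))))

# Synthetic data generated from assumed parameters w = 300, b = 11000 (no noise).
ages = np.array([18, 25, 33, 41, 50, 58, 66], dtype=float)
premiums = estimate_premiums(ages, 300, 11000)

# Try every (w, b) pair on a coarse grid and keep the one with the lowest RMSE.
best = None
for w in range(100, 501, 50):
    for b in range(5000, 20001, 1000):
        err = rmse(premiums, estimate_premiums(ages, w, b))
        if best is None or err < best[0]:
            best = (err, w, b)

print(best)  # (0.0, 300, 11000)
```

On real, noisy data the best grid point only approximates the least-squares fit and depends on the grid resolution; scikit-learn's `LinearRegression` finds the least-squares solution directly.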
plt.title('Age vs. PremiumPrice')
sns.scatterplot(data=medical_df, x='Age', y='PremiumPrice', alpha=0.7, s=15);
def estimate_premiums(Age, w, b):
    # Linear model: estimated premium = slope (w) * age + intercept (b)
    return w * Age + b
w = 50
b = 100
estimate_premiums(30, w, b)
1600
medical_df.PremiumPrice
0 25000
1 29000
2 23000
3 28000
4 23000
...
981 15000
982 28000
983 29000
984 39000
985 15000
Name: PremiumPrice, Length: 986, dtype: int64
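The next two cells call `try_parameters`, whose definition is not shown in this section. Based on the surrounding description (estimates plotted against the actual data), a plausible version might look like the following. This is a hypothetical reconstruction, not the author's code: it takes the ages and targets as explicit arguments instead of reading them from `medical_df`, returns the RMSE, and only imports matplotlib when `show_plot=True`.

```python
import numpy as np

def try_parameters(w, b, ages, targets, show_plot=True):
    """Hypothetical helper: compare w * age + b with the actual premiums.

    Returns the RMSE of the estimate and optionally draws the estimate line over the data.
    """
    ages = np.asarray(ages, dtype=float)
    targets = np.asarray(targets, dtype=float)
    estimates = w * ages + b
    loss = float(np.sqrt(np.mean((targets - estimates) ** 2)))

    if show_plot:
        import matplotlib.pyplot as plt  # only needed when plotting
        order = np.argsort(ages)  # sort so the estimate line is drawn left to right
        plt.scatter(ages, targets, s=8, alpha=0.7, label='Actual')
        plt.plot(ages[order], estimates[order], 'r', label='Estimate')
        plt.xlabel('Age')
        plt.ylabel('PremiumPrice')
        plt.title(f'w = {w}, b = {b}, RMSE = {loss:.2f}')
        plt.legend()
        plt.show()
    return loss
```

For example, `try_parameters(300, 11000, medical_df.Age, medical_df.PremiumPrice)` would plot the w = 300, b = 11,000 line and return the same RMSE as the notebook's `rmse` cell for those parameters (about 4551.26).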
try_parameters(150, 15000)
try_parameters(300, 11000)
A model's predictions can be evaluated with a loss (cost) function, which measures how well the predictions match the actual values in the dataset. The loss used here is the root mean squared error ("RMSE"): the square root of the average squared difference between the predicted and actual values. It is expressed in the same units as PremiumPrice and can be read as the typical size of a prediction error, with large errors weighted more heavily because they are squared.
Analysis:
The sample parameters from the previous step (w = 300 and b = 11,000) were used to compute the RMSE, which came to 4551.26. PremiumPrice ranges from 15,000 to 40,000, a span of 25,000, so the error is about 18 percent of the range (4551.26 / 25,000). Because the data contains considerable spread and outliers at every age, this is an acceptable starting point. However, given the moderate correlation between Age and PremiumPrice, an RMSE closer to 10 to 15 percent of the data range would be preferable.
import numpy as np

def rmse(targets, predictions):
    # Root mean squared error between actual and predicted values
    return np.sqrt(np.mean(np.square(targets - predictions)))
w = 300
b = 11000
targets = medical_df.PremiumPrice
predictions = estimate_premiums(medical_df.Age, w, b)
rmse(targets, predictions)
4551.257544159166
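The RMSE formula and the percentage quoted in the analysis can be checked with a few lines of arithmetic. In the sketch below, the tiny example has errors of 2 and 0, so RMSE = sqrt((2² + 0²) / 2) = sqrt(2); the second part divides the RMSE above by the PremiumPrice range taken from the summary statistics (max 40,000 minus min 15,000):

```python
import numpy as np

def rmse(targets, predictions):
    """Square each error, average the squares, then take the square root."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return float(np.sqrt(np.mean(np.square(targets - predictions))))

# Worked example: errors are 3 - 1 = 2 and 5 - 5 = 0.
small_example = rmse([3, 5], [1, 5])
print(small_example)  # 1.4142135623730951, i.e. sqrt(2)

# RMSE as a percentage of the PremiumPrice range (40,000 - 15,000 = 25,000).
premium_range = 40000 - 15000
relative_pct = round(4551.257544159166 / premium_range * 100, 1)
print(relative_pct)  # 18.2
```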
Scikit-learn was used to automate the model fitting and to predict PremiumPrice. In practice, the manual steps above are unnecessary, but they help build an understanding of the data, and their results can be compared with the final model to verify it.
Analysis:
A new LinearRegression model object was created. Scikit-learn expects the inputs ("X") to be a 2-D array, so a single-column DataFrame (medical_df[['Age']]) was passed instead of a Series. The model was then fitted to the data and used to predict PremiumPrice. The model's RMSE is 4474.84, close to the 4551.26 obtained with the hand-picked parameters. The fitted parameters are stored in the model's coef_ (w ≈ 312.13) and intercept_ (b ≈ 11,306.80) attributes, which confirms the earlier estimates of roughly 300 and 11,000. A scatter plot was created to compare the model's predictions with the actual values.
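`LinearRegression` fits ordinary least squares: it picks the w and b that minimize the squared error (and therefore the RMSE). For a single feature, that solution has a closed form, w = cov(x, y) / var(x) and b = mean(y) - w * mean(x). A sketch on synthetic data generated from assumed values w = 312 and b = 11,300:

```python
import numpy as np

# Synthetic, noise-free data; the slope and intercept are assumed for illustration.
ages = np.array([18, 30, 42, 54, 66], dtype=float)
premiums = 312.0 * ages + 11300.0

# Closed-form least-squares fit for one feature.
x_mean, y_mean = ages.mean(), premiums.mean()
w = np.sum((ages - x_mean) * (premiums - y_mean)) / np.sum((ages - x_mean) ** 2)
b = y_mean - w * x_mean

print(w, b)  # approximately 312.0 and 11300.0
```

Applied to `medical_df.Age` and `medical_df.PremiumPrice`, the same two lines should reproduce `model.coef_` and `model.intercept_` up to floating-point rounding.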
from sklearn.linear_model import LinearRegression
model = LinearRegression()
inputs = medical_df[['Age']]
targets = medical_df.PremiumPrice
print('inputs.shape :', inputs.shape)
print('targets.shape :', targets.shape)
inputs.shape : (986, 1)
targets.shape : (986,)
model.fit(inputs, targets)
LinearRegression()
predictions = model.predict(inputs)
predictions
array([25352.5543132 , 30034.47338215, 22543.40287183, 27537.44987871,
23167.65874769, 20670.63524426, 21607.01905805, 18485.73967875,
26288.93812699, 23167.65874769, 30034.47338215, 31907.24100972,
18797.86761668, 25664.68225113, 16925.0999891 , 23167.65874769,
24416.17049941, 23167.65874769, 29098.08956836, 17861.48380289,
26601.06606492, 17549.35586496, 22231.2749339 , 22231.2749339 ,
27849.57781664, 20982.76318219, 18173.61174082, 30034.47338215,
20670.63524426, 21607.01905805, 18173.61174082, 19422.12349254,
20046.3793684 , 19422.12349254, 31282.98513386, 26913.19400285,
25040.42637527, 29410.21750629, 24728.29843734, 18797.86761668,
17549.35586496, 31907.24100972, 19109.99555461, 27537.44987871,
19422.12349254, 25040.42637527, 19109.99555461, 28161.70575457,
31282.98513386, 17549.35586496, 20670.63524426, 30970.85719593,
30346.60132008, 24728.29843734, 29410.21750629, 24104.04256148,
30034.47338215, 26601.06606492, 18485.73967875, 25664.68225113,
22543.40287183, 28473.8336925 , 18485.73967875, 20982.76318219,
31282.98513386, 19422.12349254, 26601.06606492, 17237.22792703,
23791.91462355, 21294.89112012, 19734.25143047, 28473.8336925 ,
30970.85719593, 19109.99555461, 27849.57781664, 31907.24100972,
18485.73967875, 22543.40287183, 26601.06606492, 20670.63524426,
25040.42637527, 30970.85719593, 19734.25143047, 25664.68225113,
25664.68225113, 20982.76318219, 25664.68225113, 17549.35586496,
22231.2749339 , 21607.01905805, 25664.68225113, 30970.85719593,
27537.44987871, 28161.70575457, 27849.57781664, 17861.48380289,
21919.14699598, 20982.76318219, 27537.44987871, 24416.17049941,
18797.86761668, 25352.5543132 , 29722.34544422, 17861.48380289,
18485.73967875, 21607.01905805, 30034.47338215, 20982.76318219,
20046.3793684 , 31282.98513386, 28473.8336925 , 22231.2749339 ,
20358.50730633, 22231.2749339 , 25976.81018906, 19109.99555461,
28473.8336925 , 29722.34544422, 31907.24100972, 21294.89112012,
28161.70575457, 30970.85719593, 23167.65874769, 18485.73967875,
21607.01905805, 20982.76318219, 24728.29843734, 23479.78668562,
29722.34544422, 17237.22792703, 20046.3793684 , 29722.34544422,
23167.65874769, 18485.73967875, 21919.14699598, 19734.25143047,
22855.53080976, 31907.24100972, 27225.32194078, 26288.93812699,
23791.91462355, 28161.70575457, 21607.01905805, 21607.01905805,
29722.34544422, 27537.44987871, 22231.2749339 , 29722.34544422,
27849.57781664, 26913.19400285, 28785.96163043, 25976.81018906,
29098.08956836, 24416.17049941, 17861.48380289, 30658.729258 ,
30658.729258 , 17549.35586496, 24728.29843734, 25664.68225113,
29410.21750629, 17237.22792703, 21919.14699598, 27849.57781664,
20358.50730633, 21294.89112012, 16925.0999891 , 20046.3793684 ,
25040.42637527, 24416.17049941, 26288.93812699, 20670.63524426,
20982.76318219, 25976.81018906, 18797.86761668, 28473.8336925 ,
24416.17049941, 22231.2749339 , 21607.01905805, 31907.24100972,
23167.65874769, 22855.53080976, 20670.63524426, 22543.40287183,
20670.63524426, 25664.68225113, 17237.22792703, 19109.99555461,
24416.17049941, 22231.2749339 , 20670.63524426, 26288.93812699,
26913.19400285, 21607.01905805, 24728.29843734, 17237.22792703,
23479.78668562, 25352.5543132 , 26288.93812699, 17549.35586496,
23791.91462355, 21294.89112012, 20670.63524426, 18797.86761668,
19734.25143047, 30034.47338215, 20982.76318219, 16925.0999891 ,
21294.89112012, 26601.06606492, 24104.04256148, 20358.50730633,
30034.47338215, 25664.68225113, 27537.44987871, 18173.61174082,
31907.24100972, 31595.11307179, 30658.729258 , 22231.2749339 ,
26913.19400285, 30346.60132008, 26601.06606492, 19109.99555461,
22855.53080976, 29098.08956836, 30034.47338215, 25040.42637527,
18173.61174082, 28785.96163043, 30970.85719593, 22543.40287183,
27849.57781664, 20358.50730633, 21294.89112012, 16925.0999891 ,
22231.2749339 , 21919.14699598, 30658.729258 , 28161.70575457,
26601.06606492, 17861.48380289, 17549.35586496, 26601.06606492,
25976.81018906, 28473.8336925 , 20358.50730633, 27537.44987871,
25976.81018906, 17237.22792703, 19109.99555461, 31595.11307179,
21294.89112012, 27849.57781664, 30034.47338215, 22855.53080976,
17549.35586496, 18173.61174082, 29098.08956836, 21607.01905805,
26288.93812699, 19109.99555461, 28161.70575457, 22543.40287183,
18797.86761668, 18797.86761668, 28473.8336925 , 20670.63524426,
23791.91462355, 26601.06606492, 25664.68225113, 30658.729258 ,
25040.42637527, 22543.40287183, 31282.98513386, 26288.93812699,
17549.35586496, 20046.3793684 , 25664.68225113, 27537.44987871,
16925.0999891 , 22231.2749339 , 31282.98513386, 31907.24100972,
26913.19400285, 26288.93812699, 31907.24100972, 27537.44987871,
26288.93812699, 31282.98513386, 26601.06606492, 20358.50730633,
24728.29843734, 25352.5543132 , 23791.91462355, 31282.98513386,
23791.91462355, 21294.89112012, 27537.44987871, 22231.2749339 ,
19422.12349254, 19109.99555461, 19109.99555461, 23791.91462355,
19422.12349254, 24728.29843734, 28785.96163043, 23167.65874769,
31595.11307179, 31282.98513386, 18485.73967875, 19109.99555461,
22543.40287183, 23791.91462355, 29722.34544422, 21294.89112012,
27537.44987871, 27849.57781664, 28473.8336925 , 24416.17049941,
20046.3793684 , 21607.01905805, 21919.14699598, 18797.86761668,
30346.60132008, 20358.50730633, 26288.93812699, 23167.65874769,
27225.32194078, 30346.60132008, 20670.63524426, 29098.08956836,
30970.85719593, 24728.29843734, 20982.76318219, 28473.8336925 ,
17237.22792703, 26913.19400285, 25352.5543132 , 25976.81018906,
25976.81018906, 24728.29843734, 26288.93812699, 19109.99555461,
21919.14699598, 25976.81018906, 18173.61174082, 21919.14699598,
30970.85719593, 27225.32194078, 26288.93812699, 21607.01905805,
24416.17049941, 17237.22792703, 24416.17049941, 31595.11307179,
25976.81018906, 24104.04256148, 25352.5543132 , 26601.06606492,
24416.17049941, 21919.14699598, 22855.53080976, 24728.29843734,
25664.68225113, 28161.70575457, 26288.93812699, 27537.44987871,
22231.2749339 , 22543.40287183, 30658.729258 , 31282.98513386,
19109.99555461, 30658.729258 , 20358.50730633, 22855.53080976,
24728.29843734, 30346.60132008, 28161.70575457, 24104.04256148,
30658.729258 , 24728.29843734, 17237.22792703, 23167.65874769,
25352.5543132 , 28161.70575457, 20358.50730633, 30658.729258 ,
28161.70575457, 25352.5543132 , 31282.98513386, 23791.91462355,
22231.2749339 , 29410.21750629, 29098.08956836, 20046.3793684 ,
21607.01905805, 21294.89112012, 31907.24100972, 24728.29843734,
21294.89112012, 31595.11307179, 21919.14699598, 28473.8336925 ,
31282.98513386, 25040.42637527, 22231.2749339 , 23167.65874769,
25352.5543132 , 23479.78668562, 29722.34544422, 22543.40287183,
30346.60132008, 21607.01905805, 25040.42637527, 20982.76318219,
16925.0999891 , 25352.5543132 , 21919.14699598, 22855.53080976,
20358.50730633, 22543.40287183, 18173.61174082, 17549.35586496,
30346.60132008, 26601.06606492, 19734.25143047, 29722.34544422,
18485.73967875, 17861.48380289, 29410.21750629, 22855.53080976,
18797.86761668, 25040.42637527, 26913.19400285, 22855.53080976,
20358.50730633, 22855.53080976, 18797.86761668, 31595.11307179,
26288.93812699, 31282.98513386, 18173.61174082, 23479.78668562,
17237.22792703, 16925.0999891 , 29722.34544422, 30034.47338215,
19734.25143047, 26913.19400285, 17549.35586496, 23479.78668562,
31282.98513386, 29410.21750629, 21294.89112012, 17237.22792703,
17237.22792703, 18173.61174082, 30346.60132008, 29410.21750629,
20358.50730633, 28785.96163043, 18797.86761668, 31595.11307179,
18797.86761668, 26288.93812699, 24416.17049941, 25040.42637527,
19422.12349254, 26601.06606492, 22231.2749339 , 29410.21750629,
25976.81018906, 29098.08956836, 25664.68225113, 24104.04256148,
16925.0999891 , 26601.06606492, 21294.89112012, 25352.5543132 ,
19734.25143047, 28785.96163043, 24416.17049941, 31282.98513386,
29410.21750629, 28785.96163043, 29410.21750629, 25352.5543132 ,
25352.5543132 , 20670.63524426, 21607.01905805, 29722.34544422,
19734.25143047, 20670.63524426, 31282.98513386, 21919.14699598,
27225.32194078, 22231.2749339 , 30658.729258 , 25664.68225113,
25352.5543132 , 19422.12349254, 26913.19400285, 31282.98513386,
27537.44987871, 22543.40287183, 28161.70575457, 26601.06606492,
27225.32194078, 31282.98513386, 17861.48380289, 22231.2749339 ,
17237.22792703, 30970.85719593, 23479.78668562, 24728.29843734,
25976.81018906, 30346.60132008, 24416.17049941, 17549.35586496,
16925.0999891 , 26913.19400285, 27537.44987871, 22231.2749339 ,
17861.48380289, 22855.53080976, 28161.70575457, 24416.17049941,
27225.32194078, 20982.76318219, 20046.3793684 , 20046.3793684 ,
26601.06606492, 25664.68225113, 23167.65874769, 25664.68225113,
31595.11307179, 19109.99555461, 19109.99555461, 30658.729258 ,
28161.70575457, 29722.34544422, 31282.98513386, 30970.85719593,
30346.60132008, 25976.81018906, 24104.04256148, 29722.34544422,
16925.0999891 , 20046.3793684 , 31907.24100972, 29410.21750629,
29722.34544422, 31282.98513386, 30970.85719593, 22855.53080976,
16925.0999891 , 20358.50730633, 24416.17049941, 18173.61174082,
22543.40287183, 22855.53080976, 21294.89112012, 27225.32194078,
23479.78668562, 30346.60132008, 19734.25143047, 28473.8336925 ,
31907.24100972, 29410.21750629, 30034.47338215, 24728.29843734,
28161.70575457, 17237.22792703, 31907.24100972, 27225.32194078,
21919.14699598, 24728.29843734, 20670.63524426, 24104.04256148,
16925.0999891 , 26601.06606492, 26913.19400285, 24728.29843734,
19734.25143047, 24728.29843734, 20358.50730633, 20982.76318219,
24416.17049941, 20982.76318219, 24104.04256148, 20982.76318219,
25976.81018906, 19734.25143047, 25040.42637527, 29410.21750629,
19734.25143047, 28473.8336925 , 27225.32194078, 18173.61174082,
20046.3793684 , 25040.42637527, 26601.06606492, 25664.68225113,
19422.12349254, 30658.729258 , 19734.25143047, 25352.5543132 ,
30034.47338215, 30346.60132008, 19422.12349254, 29410.21750629,
19734.25143047, 26913.19400285, 29098.08956836, 29722.34544422,
24728.29843734, 17861.48380289, 24416.17049941, 29722.34544422,
18797.86761668, 21294.89112012, 25352.5543132 , 24416.17049941,
21919.14699598, 25352.5543132 , 20046.3793684 , 22543.40287183,
18797.86761668, 24104.04256148, 24104.04256148, 26288.93812699,
24104.04256148, 27225.32194078, 19734.25143047, 22231.2749339 ,
23791.91462355, 24104.04256148, 17861.48380289, 25040.42637527,
22543.40287183, 25352.5543132 , 25040.42637527, 27225.32194078,
17549.35586496, 30970.85719593, 27225.32194078, 18173.61174082,
30970.85719593, 20670.63524426, 23791.91462355, 29722.34544422,
17861.48380289, 20670.63524426, 24728.29843734, 24416.17049941,
30970.85719593, 23479.78668562, 30970.85719593, 31907.24100972,
25040.42637527, 20046.3793684 , 23167.65874769, 27849.57781664,
24728.29843734, 17861.48380289, 26913.19400285, 31907.24100972,
19734.25143047, 26601.06606492, 22543.40287183, 23791.91462355,
20982.76318219, 29722.34544422, 24728.29843734, 27225.32194078,
18173.61174082, 30346.60132008, 31907.24100972, 31595.11307179,
28161.70575457, 21919.14699598, 31907.24100972, 29722.34544422,
18797.86761668, 27225.32194078, 25040.42637527, 26601.06606492,
25664.68225113, 25976.81018906, 27849.57781664, 22231.2749339 ,
17549.35586496, 26288.93812699, 31907.24100972, 24728.29843734,
19734.25143047, 18173.61174082, 28785.96163043, 29722.34544422,
30034.47338215, 26601.06606492, 20358.50730633, 24728.29843734,
25664.68225113, 26913.19400285, 20358.50730633, 25976.81018906,
28785.96163043, 21607.01905805, 24104.04256148, 23167.65874769,
29722.34544422, 29098.08956836, 30658.729258 , 21294.89112012,
28161.70575457, 26913.19400285, 28473.8336925 , 27537.44987871,
27225.32194078, 22855.53080976, 16925.0999891 , 28785.96163043,
23167.65874769, 26288.93812699, 30034.47338215, 25352.5543132 ,
18485.73967875, 29098.08956836, 23479.78668562, 31595.11307179,
26913.19400285, 28785.96163043, 19734.25143047, 26913.19400285,
31907.24100972, 27849.57781664, 19422.12349254, 19109.99555461,
19734.25143047, 17237.22792703, 28785.96163043, 21607.01905805,
17237.22792703, 18797.86761668, 18797.86761668, 22855.53080976,
24416.17049941, 17549.35586496, 30970.85719593, 17861.48380289,
31595.11307179, 20670.63524426, 21294.89112012, 27849.57781664,
31282.98513386, 25040.42637527, 25664.68225113, 19734.25143047,
24728.29843734, 18485.73967875, 17237.22792703, 24728.29843734,
16925.0999891 , 19109.99555461, 29722.34544422, 29722.34544422,
20670.63524426, 29722.34544422, 27849.57781664, 23791.91462355,
24104.04256148, 26601.06606492, 25976.81018906, 31595.11307179,
24416.17049941, 24416.17049941, 24104.04256148, 25352.5543132 ,
21919.14699598, 21294.89112012, 18797.86761668, 30970.85719593,
28785.96163043, 30034.47338215, 19734.25143047, 19734.25143047,
20670.63524426, 25976.81018906, 22543.40287183, 29410.21750629,
17549.35586496, 21294.89112012, 21607.01905805, 16925.0999891 ,
20358.50730633, 21919.14699598, 29098.08956836, 25040.42637527,
21607.01905805, 18797.86761668, 18173.61174082, 22543.40287183,
25040.42637527, 20358.50730633, 19734.25143047, 30658.729258 ,
21607.01905805, 26288.93812699, 19109.99555461, 22231.2749339 ,
30658.729258 , 21607.01905805, 25976.81018906, 26288.93812699,
24416.17049941, 31595.11307179, 31595.11307179, 26913.19400285,
23791.91462355, 31595.11307179, 16925.0999891 , 30658.729258 ,
22543.40287183, 21294.89112012, 27537.44987871, 19734.25143047,
31595.11307179, 20982.76318219, 27537.44987871, 19422.12349254,
23791.91462355, 19109.99555461, 17861.48380289, 17861.48380289,
20358.50730633, 26913.19400285, 25040.42637527, 28785.96163043,
26288.93812699, 28785.96163043, 26913.19400285, 30034.47338215,
16925.0999891 , 24104.04256148, 22231.2749339 , 18797.86761668,
18173.61174082, 17549.35586496, 26288.93812699, 16925.0999891 ,
30346.60132008, 22855.53080976, 28161.70575457, 20358.50730633,
17861.48380289, 28473.8336925 , 31595.11307179, 20046.3793684 ,
30346.60132008, 25040.42637527, 27849.57781664, 19109.99555461,
21607.01905805, 31907.24100972, 20046.3793684 , 30658.729258 ,
19109.99555461, 16925.0999891 , 23479.78668562, 20046.3793684 ,
24416.17049941, 22231.2749339 , 27225.32194078, 26288.93812699,
27225.32194078, 18485.73967875, 27849.57781664, 20982.76318219,
25040.42637527, 27225.32194078, 22855.53080976, 18173.61174082,
25664.68225113, 18173.61174082, 27537.44987871, 21294.89112012,
25352.5543132 , 24728.29843734, 29410.21750629, 24416.17049941,
30658.729258 , 25352.5543132 , 25976.81018906, 19734.25143047,
16925.0999891 , 28473.8336925 , 23167.65874769, 22855.53080976,
27225.32194078, 17237.22792703, 21919.14699598, 29098.08956836,
21919.14699598, 23479.78668562, 28473.8336925 , 20670.63524426,
30658.729258 , 19109.99555461, 17237.22792703, 27225.32194078,
20670.63524426, 23791.91462355, 17861.48380289, 28161.70575457,
30970.85719593, 31907.24100972, 24728.29843734, 25976.81018906,
20358.50730633, 27849.57781664, 21607.01905805, 17861.48380289,
30658.729258 , 24104.04256148, 18797.86761668, 20982.76318219,
17237.22792703, 18173.61174082, 24104.04256148, 30346.60132008,
24416.17049941, 28161.70575457, 28161.70575457, 23167.65874769,
17237.22792703, 26288.93812699, 20982.76318219, 24728.29843734,
23791.91462355, 22231.2749339 , 19109.99555461, 27849.57781664,
24728.29843734, 28473.8336925 , 25352.5543132 , 20358.50730633,
30034.47338215, 20670.63524426, 27537.44987871, 19109.99555461,
25040.42637527, 30658.729258 , 30970.85719593, 17861.48380289,
25664.68225113, 19734.25143047, 16925.0999891 , 31907.24100972,
18485.73967875, 29722.34544422, 21294.89112012, 26601.06606492,
19734.25143047, 22231.2749339 , 31907.24100972, 24416.17049941,
16925.0999891 , 25352.5543132 , 25664.68225113, 19422.12349254,
20982.76318219, 20046.3793684 , 25976.81018906, 25040.42637527,
17861.48380289, 25352.5543132 , 23791.91462355, 18797.86761668,
23791.91462355, 16925.0999891 , 31282.98513386, 28785.96163043,
25976.81018906, 17861.48380289])
rmse(targets, predictions)
4474.83985341589
# w
model.coef_
array([312.12793793])
# b
model.intercept_
11306.797106367303
try_parameters(model.coef_, model.intercept_)
Age alone still leaves a sizeable error, so a second feature (Weight) was added to the model to see whether it reduces the loss. The formula is as follows:
PremiumPrice = w1 * Age + w2 * Weight + b
Analysis:
With Weight included, the RMSE is 4369.57, slightly lower than the single-feature model's 4474.84. As expected, adding Weight does not lower the loss substantially, because Weight has only a weak correlation (0.14) with PremiumPrice.
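With two features the fit is still ordinary least squares, now with one weight per column plus the intercept. A sketch using NumPy's `np.linalg.lstsq` on synthetic data built from assumed coefficients (300 per year of Age, 50 per unit of Weight, intercept 10,000) shows how the design matrix is assembled; it illustrates the technique and is not how the notebook calls scikit-learn:

```python
import numpy as np

# Synthetic, noise-free data built from assumed coefficients.
ages = np.array([20, 35, 50, 65, 28, 44], dtype=float)
weights = np.array([60, 90, 70, 80, 100, 55], dtype=float)
premiums = 300 * ages + 50 * weights + 10000

# Design matrix: one column per feature plus a column of ones for the intercept b.
X = np.column_stack([ages, weights, np.ones_like(ages)])

# Solve X @ [w1, w2, b] ≈ premiums in the least-squares sense.
coeffs, residuals, rank, _ = np.linalg.lstsq(X, premiums, rcond=None)
w1, w2, b = coeffs
print(np.round(coeffs, 6))  # recovers w1 = 300, w2 = 50, b = 10000
```

Each extra feature adds one column to `X`; a feature that is only weakly related to the target, like Weight here, gets a weight but reduces the residual error only slightly.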
inputs = medical_df[['Age', 'Weight']]
targets = medical_df.PremiumPrice
print('inputs.shape :', inputs.shape)
print('targets.shape :', targets.shape)
inputs.shape : (986, 2)
targets.shape : (986,)
model.fit(inputs, targets)
LinearRegression()
predictions = model.predict(inputs)
predictions
array([24006.41690067, 29790.56580777, 21321.06467302, 28636.94320625,
23910.73428592, 20117.42655092, 20042.40327006, 18600.37501815,
26097.28911393, 24249.15545688, 29858.25004196, 31264.94086664,
17424.73527661, 27027.199679 , 16830.25526223, 22557.04960209,
23743.01901057, 23504.62888077, 28038.11476547, 19191.86441226,
27967.43991101, 17795.50325453, 23241.23099068, 22632.07288295,
29221.09355368, 19618.62915129, 19505.27782293, 30805.82932065,
20388.16348769, 20313.14020682, 19437.59358874, 20352.82606046,
19422.91549539, 20149.77335788, 31856.43026075, 27062.53710623,
23557.63502161, 28351.52817613, 25004.01169992, 19725.99923912,
16915.60821004, 33227.78365819, 17602.78021889, 27892.41663014,
18051.56209795, 25114.37240802, 18550.35949758, 29466.82273016,
29893.58746919, 17930.87172291, 21065.0058296 , 31001.54297655,
29088.71570557, 25680.85404184, 28216.15970775, 23158.86866314,
29925.93427616, 25395.43901173, 18600.37501815, 25335.09382421,
22065.59124913, 27411.28794413, 18668.05925234, 21039.99806932,
30367.37710853, 19540.61525016, 27967.43991101, 16060.72092583,
22168.61291055, 19932.04256196, 18094.23857185, 29644.86767245,
30392.38486882, 18482.67526338, 26310.67148345, 31806.41474017,
18938.79618911, 22336.3281859 , 25057.01784077, 21335.74276637,
24166.79312934, 29444.80559014, 18500.343977 , 25538.14652678,
24252.14607714, 21243.05077189, 26824.14697642, 18472.34559644,
21549.12513589, 20313.14020682, 24590.5672481 , 32219.85919199,
27486.31122499, 27436.29570441, 27867.40886985, 19056.49594388,
21709.50136456, 21175.3665377 , 26606.4161805 , 25773.54603632,
20064.42041008, 26307.68086318, 30763.15284674, 16416.81081041,
19751.00699941, 20989.98254874, 28775.3022949 , 22258.31428476,
19896.70513473, 29690.53476662, 29441.81496987, 20939.96702816,
20142.43431121, 21210.70396493, 25919.24417164, 18618.04373177,
29103.39379891, 28732.625821 , 30452.73005634, 20067.41103035,
27300.92723603, 30663.12180559, 22827.78653885, 19344.90159426,
22072.9302958 , 22190.63005057, 24665.59052897, 22058.25220246,
29544.83663129, 16196.08939422, 18407.65198252, 30153.99473902,
24384.52392526, 19751.00699941, 22860.13334581, 19989.39712921,
23461.95240686, 31806.41474017, 27849.74015624, 25352.76253782,
25011.3507466 , 26962.50606507, 20854.61408036, 20922.29831455,
30018.62627064, 27418.6269908 , 21413.7566675 , 29138.73122614,
28002.77733824, 25302.74701725, 28469.2279309 , 28626.61353931,
28647.27287319, 25299.75639698, 20613.23333028, 32921.7092942 ,
30146.65569234, 17457.08208357, 27711.38106759, 25132.04112163,
32074.16105667, 17279.03714128, 23333.92298516, 27596.67193309,
20277.80277959, 23113.20156897, 16897.93949642, 20844.28441341,
24775.95123706, 24622.91405506, 27315.60532938, 20861.95312703,
21107.68230351, 26393.03381098, 18507.68302367, 31133.92082466,
24013.75594734, 24188.81026936, 21396.08795389, 32009.46744275,
22557.04960209, 24341.84745135, 21268.05853218, 25382.11872452,
22689.4274502 , 26621.09427385, 20730.93308505, 21257.72886524,
25502.80909955, 23782.70486421, 21606.47970314, 28466.23731063,
30311.38034742, 23629.66768221, 25207.0644025 , 20933.98578763,
26390.04319072, 26239.99662899, 27857.07920291, 18133.92442548,
24740.61380983, 21624.14841675, 21944.90087409, 22230.31590421,
22629.08226268, 32091.82977028, 23476.63050021, 20011.41426923,
22368.67499286, 28847.3349555 , 25324.76415727, 20548.53971636,
31482.67166256, 27974.77895768, 28298.52203529, 19099.17241778,
33701.57329754, 34944.89727327, 33666.23587031, 24053.44180097,
28619.27449263, 33149.76975706, 28508.91378454, 20107.09688398,
23935.74204621, 32572.9584563 , 31347.30319418, 28227.84718083,
21197.38367772, 29484.49144378, 29715.54252691, 22336.3281859 ,
27596.67193309, 19127.17079834, 20135.09526454, 16491.83409127,
23308.91522487, 21980.23830132, 31500.34037618, 29263.77002759,
27832.07144262, 18311.96936777, 16238.76586812, 27358.28180328,
26596.08651356, 27208.23524155, 19939.38160864, 27283.25852242,
25107.03336135, 17888.19524901, 19971.7284156 , 30680.7905192 ,
22368.67499286, 29221.09355368, 30873.51355484, 22852.79429914,
17389.39784938, 16730.22422108, 30407.06296217, 22208.29876419,
26841.81569004, 18008.88562404, 27165.55876765, 23757.69710392,
18846.10419463, 19252.20959978, 27749.70911508, 21606.47970314,
24199.1399363 , 26139.96558783, 24387.51454553, 31026.55073683,
24099.10889515, 21997.90701494, 32059.48296332, 26774.13145585,
16374.1343365 , 19084.49432443, 25741.19922936, 27689.36392757,
16627.20255966, 21752.17783846, 31585.69332398, 31671.04627179,
27197.90557461, 26977.18415842, 30926.51969568, 26200.31077535,
25420.44677201, 30908.85098207, 26343.01829041, 20277.80277959,
24259.48512382, 26104.6281606 , 24469.87687306, 32262.5356659 ,
23386.929126 , 21759.51688513, 28298.52203529, 22902.80981972,
19202.1940792 , 18685.72796596, 18956.46490272, 24537.56110726,
18660.72020567, 25004.01169992, 30025.96531731, 22760.10230466,
31899.10673465, 32465.58836847, 17991.21691043, 20377.83382075,
23148.5389962 , 25011.3507466 , 29815.57356806, 20811.93760645,
26944.83735146, 27935.09310404, 29035.70956472, 23472.28207381,
20032.07360312, 22952.82534029, 23266.23875096, 18643.05149206,
30171.66345263, 21360.75052666, 26232.65758231, 22624.73383628,
27782.05592205, 30916.19002874, 20185.11078511, 30474.74719636,
31204.59567912, 23785.69548448, 21649.15617704, 29712.55190664,
16737.56326775, 27536.32674557, 24277.15383743, 25039.34912715,
26325.34957679, 23650.32701609, 27586.34226614, 18211.93832662,
21235.71172522, 25648.50723488, 16865.59268946, 22115.60676971,
30053.96369787, 27105.21358013, 27450.97379776, 20786.92984616,
25029.01946021, 18565.03759092, 25096.7036944 , 30951.52745597,
26189.98110841, 22820.44749218, 25495.47005288, 25666.17594849,
25164.3879286 , 22318.65947228, 23935.74204621, 24462.53782639,
26485.72580546, 28586.92768567, 26435.71028489, 26741.78464888,
22293.65171199, 21253.38043883, 30688.12956588, 31111.90368464,
20310.14958656, 29740.5502872 , 20548.53971636, 23191.2154701 ,
24665.59052897, 30713.13732617, 27774.71687537, 23903.39523924,
29537.49758462, 23379.59007933, 18565.03759092, 22895.47077304,
24006.41690067, 29128.4015592 , 20007.06584283, 30823.49803426,
26962.50606507, 26036.94392641, 29893.58746919, 23793.03453115,
21278.38819912, 29096.05475224, 29053.37827834, 18746.07315348,
20989.98254874, 20202.77949873, 31467.99356921, 23447.27431352,
19999.72679615, 29936.2639431 , 21709.50136456, 28426.551457 ,
31179.58791883, 23354.58231904, 21007.65126235, 21609.4703234 ,
25360.1015845 , 21922.88373407, 29477.1523971 , 22404.01242009,
29900.92651587, 21328.4037197 , 23422.26655323, 20701.57689836,
15679.62328097, 24886.31194516, 20220.44821234, 22785.11006495,
19127.17079834, 22200.95971751, 16797.90845527, 16915.60821004,
28953.34723719, 25530.80748011, 18906.44938215, 27988.09924489,
17111.32186594, 17432.07432328, 28419.21241033, 22243.63619142,
18710.73572625, 24302.16159772, 26656.43170108, 21905.21502046,
19871.69737444, 21160.68844435, 17424.73527661, 30342.36934825,
25082.02560106, 30435.06134273, 17271.69809461, 21787.51526569,
16940.61597032, 15950.36021774, 27988.09924489, 29858.25004196,
19515.60748987, 26859.48440365, 17524.76631776, 23208.88418371,
31044.21945045, 27674.68583422, 20676.56913807, 15586.93128649,
16737.56326775, 17948.54043652, 29968.61075006, 28419.21241033,
18924.11809576, 27521.64865222, 17763.15644757, 31696.05403208,
18575.36725786, 25826.55217716, 23133.86090285, 24979.00393964,
18931.45714244, 26343.01829041, 21075.33549655, 28486.89664452,
25986.92840584, 27699.69359451, 24116.77760876, 23632.65830248,
16085.72868612, 26478.38675879, 20473.5164355 , 24074.10113486,
18771.08091376, 28266.17522833, 23404.59783962, 30164.32440596,
27674.68583422, 28198.49099414, 28486.89664452, 25427.78581869,
23803.36419809, 19102.16303805, 20042.40327006, 29138.73122614,
18838.76514796, 18966.79456967, 30299.69287434, 20897.29055426,
26428.37123821, 21278.38819912, 30011.28722396, 24590.5672481 ,
25630.83852126, 19134.50984501, 26791.80016946, 32330.21990009,
28163.15356691, 23216.22323039, 28451.55921729, 26884.49216394,
27646.68745366, 31721.06179237, 18921.12747549, 22022.91477523,
18158.93218577, 31678.38531846, 23547.30535467, 24733.27476316,
27340.61308967, 30103.97921844, 25502.80909955, 18269.29289387,
17642.46607253, 28010.11638491, 27960.10086433, 22835.12558553,
18515.02207035, 22785.11006495, 28993.03309082, 24893.65099183,
28391.21402977, 21513.78770866, 20370.49477407, 20979.6528818 ,
26884.49216394, 26350.35733708, 23436.94464658, 26824.14697642,
31763.73826627, 19633.30724464, 19227.20183949, 32041.81424971,
28519.24345148, 29612.52086549, 31924.11449494, 31001.54297655,
30239.34768682, 26122.29687422, 24106.44794182, 30018.62627064,
16694.88679385, 20099.75783731, 32483.25708209, 30246.6867335 ,
30560.10014417, 31179.58791883, 31001.54297655, 23732.68934363,
18116.25571187, 20074.75007702, 25705.86180213, 19166.85665197,
22877.80205943, 24003.4262804 , 22165.62229028, 28391.21402977,
24427.20039916, 31660.71660485, 19854.02866083, 29238.7622673 ,
32889.36248724, 29231.42322062, 29858.25004196, 25545.48557346,
28451.55921729, 17075.98443871, 31671.04627179, 27646.68745366,
21912.55406713, 24868.64323154, 21741.84817152, 24377.18487858,
17033.3079648 , 26546.07099298, 26994.85287204, 25951.59097861,
20598.55523694, 24800.95899735, 21563.80322923, 20769.26113255,
25299.75639698, 21310.73500608, 24918.65875212, 21446.10347447,
27069.8761529 , 21072.34487628, 25791.21474993, 29569.84439158,
20463.18676855, 29441.81496987, 27037.52934594, 18422.33007587,
20438.17900827, 25723.53051574, 26749.12369556, 26485.72580546,
20352.82606046, 32041.81424971, 20124.7655976 , 25224.73311611,
31211.93472579, 31051.55849712, 20758.93146561, 29299.10745482,
20124.7655976 , 27400.95827718, 30204.01025959, 30086.31050483,
25680.85404184, 17973.54819681, 24352.1771183 , 29680.20509968,
19793.68347331, 21759.51688513, 25833.89122384, 24758.28252345,
22724.76487743, 26510.73356575, 20641.23171084, 22877.80205943,
19455.26230235, 24918.65875212, 22955.81596056, 25149.70983525,
25460.13262565, 26360.68700402, 19515.60748987, 22022.91477523,
22710.08678408, 23429.6055999 , 17905.86396262, 24572.89853449,
23554.64440134, 26104.6281606 , 24234.47736353, 26563.7397066 ,
16171.08163393, 30933.85874236, 27308.2662827 , 18422.33007587,
30189.33216625, 19982.05808254, 25417.45615175, 28597.25735261,
18041.232431 , 21471.11123475, 24665.59052897, 23607.65054219,
30866.17450816, 21922.88373407, 32490.59612876, 31400.30933502,
23693.00349 , 19558.28396378, 23098.52347562, 27055.19805955,
23921.06395286, 17702.81126005, 25370.43125144, 31874.09897436,
18297.29127442, 26410.7025246 , 22674.74935685, 24875.98227821,
21716.84041123, 29274.09969453, 26086.95944699, 28865.00366911,
19776.0147597 , 31931.45354161, 33498.52059496, 31899.10673465,
27707.03264118, 20761.92208588, 32280.20437951, 28732.625821 ,
20132.10464427, 26360.68700402, 25655.84628155, 26546.07099298,
25470.46229259, 25580.82300069, 27528.98769889, 20736.91432559,
18946.13523578, 25082.02560106, 33566.20482915, 26019.2752128 ,
18838.76514796, 18828.43548101, 29619.85991216, 28597.25735261,
30805.82932065, 25463.12324592, 20751.59241893, 23311.90584514,
25741.19922936, 28348.53755587, 19194.85503253, 24633.243722 ,
30364.38648827, 22275.98299838, 24038.76370763, 24316.83969107,
28732.625821 , 30339.37872798, 31703.39307875, 19932.04256196,
27097.87453346, 27739.37944814, 28223.49875442, 26741.78464888,
26969.84511175, 22311.32042561, 16491.83409127, 30229.01801988,
24384.52392526, 24878.97289848, 30738.14508645, 24074.10113486,
17246.69033432, 28444.22017061, 24494.88463335, 32305.2121398 ,
28483.90602425, 29552.17567797, 20057.0813634 , 28348.53755587,
31332.62510083, 28273.514275 , 19134.50984501, 19565.62301045,
20124.7655976 , 16872.93173613, 28469.2279309 , 22072.9302958 ,
17143.6686729 , 18981.47266301, 19184.52536559, 22785.11006495,
26044.28297309, 17998.5559571 , 32152.1749578 , 18041.232431 ,
32169.84367142, 21335.74276637, 21082.67454322, 27799.72463566,
30028.95593758, 23896.05619257, 24387.51454553, 19786.34442664,
23785.69548448, 19141.84889168, 15722.29975488, 25274.74863669,
18319.30841444, 18888.78066853, 30898.52131513, 28597.25735261,
19846.68961416, 28258.83618165, 27461.3034647 , 23386.929126 ,
22820.44749218, 25598.4917143 , 27340.61308967, 32711.31754495,
25164.3879286 , 24622.91405506, 24241.8164102 , 23803.36419809,
20491.18514911, 20270.46373292, 19793.68347331, 31543.01685008,
29552.17567797, 28910.67076328, 19109.50208472, 20327.81830017,
20929.63736122, 27272.92885547, 21456.43314141, 29908.26556254,
16306.45010231, 21894.88535352, 21937.56182742, 16762.57102804,
21699.17169762, 23063.18604839, 29730.22062025, 23693.00349 ,
20583.87714359, 18846.10419463, 16797.90845527, 21253.38043883,
25588.16204736, 20413.17124798, 19312.5547873 , 30755.81380007,
20380.82444102, 27518.65803195, 20310.14958656, 23376.59945906,
32244.86695228, 20854.61408036, 25716.19146907, 26097.28911393,
23066.17666866, 32779.00177914, 33117.4229501 , 26047.27359335,
23116.19218923, 31831.42250046, 17236.36066738, 29808.23452139,
23757.69710392, 22707.09616382, 27080.20581984, 20327.81830017,
32034.47520303, 21919.89311381, 27892.41663014, 18728.40443986,
24199.1399363 , 20107.09688398, 19327.23288064, 18582.70630454,
19330.22350091, 26724.11593527, 23693.00349 , 27589.33288641,
27112.5526268 , 27995.43829156, 26588.74746689, 29113.72346586,
15747.30751517, 22820.44749218, 22090.59900942, 20335.15734684,
18760.75124682, 16780.23974165, 26097.28911393, 16559.51832546,
29630.1895791 , 22311.32042561, 27503.97993861, 20142.43431121,
17702.81126005, 28223.49875442, 31357.63286112, 19490.59972958,
29630.1895791 , 24911.31970544, 27732.04040147, 18888.78066853,
21328.4037197 , 31197.25663245, 19490.59972958, 30011.28722396,
18347.306795 , 16288.7813887 , 22870.46301276, 19422.91549539,
23878.38747896, 21684.49360427, 26902.16087756, 25623.49947459,
26496.05547241, 17923.53267624, 27258.25076213, 20363.1557274 ,
24302.16159772, 26563.7397066 , 22175.95195723, 17813.17196814,
25538.14652678, 17813.17196814, 26809.46888308, 20676.56913807,
24750.94347677, 24259.48512382, 29299.10745482, 23743.01901057,
30011.28722396, 25021.68041354, 25445.4545323 , 19312.5547873 ,
16762.57102804, 28088.13028604, 22692.41807047, 22446.68889399,
26699.10817498, 16872.93173613, 21641.81713037, 28850.32557576,
21777.18559875, 23276.56841791, 28020.44605185, 20117.42655092,
30282.02416073, 18888.78066853, 16805.24750194, 26902.16087756,
20252.79501931, 23319.24489181, 17296.7058549 , 27910.08534376,
30257.01640044, 31197.25663245, 24597.90629477, 25580.82300069,
19668.64467187, 27732.04040147, 21125.35101712, 17567.44279166,
30078.97145815, 23429.6055999 , 18101.57761852, 20701.57689836,
16534.51056518, 17813.17196814, 23971.07947344, 29833.24228168,
24284.49288411, 27639.34840699, 27707.03264118, 22760.10230466,
16669.87903356, 25691.18370878, 20227.78725902, 24394.8535922 ,
23048.50795504, 21616.80937008, 18618.04373177, 27122.88229375,
24462.53782639, 28223.49875442, 25089.36464773, 20007.06584283,
29519.82887101, 20252.79501931, 26877.15311727, 18550.35949758,
24302.16159772, 29943.60298977, 30730.80603978, 17093.65315232,
24996.67265325, 19583.29172406, 16424.14985708, 31806.41474017,
18194.269613 , 29003.36275776, 20811.93760645, 25936.91288526,
19177.18631891, 21549.12513589, 31535.67780341, 23743.01901057,
16627.20255966, 24615.57500839, 25538.14652678, 18660.72020567,
20836.94536674, 19287.54702701, 25242.40182973, 24911.31970544,
17635.12702586, 24683.25924258, 23319.24489181, 18372.31455529,
23589.98182857, 16221.09715451, 30841.16674788, 28401.54369671,
25716.19146907, 17702.81126005])
rmse(targets, predictions)
4369.5796297419165
The model can be improved further by using all of the numeric and categorical features. Before doing so, the numeric features are put on a common scale through a process known as feature scaling by standardization: each numeric feature is rescaled to have a mean of 0 and a standard deviation of 1. This makes the learned weights comparable across features.
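Standardization replaces each value x with z = (x − μ) / σ, where μ and σ are the mean and standard deviation of that column. The sketch below applies the formula by hand with NumPy to a few made-up Age, Height and Weight rows (hypothetical values, not taken from `medical_df`). It uses the population standard deviation (`ddof=0`), which is what scikit-learn's `StandardScaler` uses, so the result should agree with `StandardScaler().fit_transform` on the same array.

```python
import numpy as np

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=0)        # population std, as in StandardScaler
    return (X - mean) / std, mean, std

# Hypothetical rows: Age, Height, Weight
raw = np.array([
    [45, 155, 57],
    [60, 180, 73],
    [36, 158, 59],
    [52, 183, 93],
    [38, 166, 88],
])

scaled, mean, std = standardize(raw)
print(scaled.round(3))
print("column means:", scaled.mean(axis=0).round(6))
print("column stds: ", scaled.std(axis=0).round(6))
```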
Analysis:
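After adding the remaining features, with the numeric ones standardized, the RMSE improved to 3731.82. The gain comes from the additional features: for ordinary least-squares regression, standardization by itself leaves the predictions, and therefore the RMSE, unchanged. The result brings the error closer to the 10 to 15 percent of the data range initially noted. The values of the standardized features are also displayed.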
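For ordinary least squares with an intercept, standardizing the features changes the weights but not the fitted predictions. The sketch below checks this by fitting the same model (via `np.linalg.lstsq`, standing in for `LinearRegression`) on raw and standardized versions of synthetic, hypothetical features. It also shows that a weight fitted on standardized features is expressed per standard deviation of that feature and converts back to a per-unit weight as w_raw = w_scaled / σ.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y = X @ w + b; returns (w, b)."""
    X1 = np.column_stack([X, np.ones(len(X))])
    params, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return params[:-1], params[-1]

def rmse(targets, predictions):
    return np.sqrt(np.mean(np.square(targets - predictions)))

# Synthetic (hypothetical) numeric features and premiums
rng = np.random.default_rng(1)
X_raw = np.column_stack([
    rng.integers(18, 67, size=300),      # age-like feature
    rng.normal(168, 10, size=300),       # height-like feature
    rng.normal(77, 14, size=300),        # weight-like feature
]).astype(float)
y = 10000 + X_raw @ np.array([300.0, -5.0, 25.0]) + rng.normal(0, 3500, size=300)

# Standardize: z = (x - mu) / sigma
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X_scaled = (X_raw - mu) / sigma

w_raw, b_raw = fit_linear(X_raw, y)
w_scaled, b_scaled = fit_linear(X_scaled, y)

rmse_raw = rmse(y, X_raw @ w_raw + b_raw)
rmse_scaled = rmse(y, X_scaled @ w_scaled + b_scaled)
print("RMSE raw:   ", rmse_raw)
print("RMSE scaled:", rmse_scaled)

# Per-standard-deviation weights converted back to per-unit weights
print("w_raw:           ", w_raw)
print("w_scaled / sigma:", w_scaled / sigma)
```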
numeric_cols = ['Age', 'Height', 'Weight', 'NumberOfMajorSurgeries']
scaler = StandardScaler()
scaler.fit(medical_df[numeric_cols])
StandardScaler()
scaled_inputs = scaler.transform(medical_df[numeric_cols])
scaled_inputs
array([[ 0.23319694, -1.30610453, -1.39924954, -0.89118667],
[ 1.30798124, 1.17085167, -0.27706151, -0.89118667],
[-0.41167363, -1.00886978, -1.25897603, 0.44423895],
...,
[ 1.02137209, -1.30610453, -0.41733501, 0.44423895],
[ 0.37650152, -1.00886978, -0.27706151, 0.44423895],
[-1.48645793, -1.00886978, -0.13678801, 0.44423895]])
cat_cols = ['Diabetes', 'BloodPressureProblems', 'AnyTransplants', 'AnyChronicDiseases', 'KnownAllergies', 'HistoryOfCancerInFamily']
categorical_data = medical_df[cat_cols].values
inputs = np.concatenate((scaled_inputs, categorical_data), axis=1)
targets = medical_df.PremiumPrice
# Create and train the model
model = LinearRegression().fit(inputs, targets)
# Generate predictions
predictions = model.predict(inputs)
# Compute loss to evaluate the model
loss = rmse(targets, predictions)
print('Loss:', loss)
Loss: 3731.8234288333797
weights_df = pd.DataFrame({
'feature': np.append(numeric_cols + cat_cols, 'intercept'),
'weight': np.append(model.coef_, model.intercept_)
})
weights_df.sort_values('weight', ascending=False)
| | feature | weight |
|---|---|---|
| 10 | intercept | 23176.017071 |
| 6 | AnyTransplants | 7894.201264 |
| 0 | Age | 4596.742766 |
| 7 | AnyChronicDiseases | 2654.886425 |
| 9 | HistoryOfCancerInFamily | 2311.829368 |
| 2 | Weight | 993.421080 |
| 8 | KnownAllergies | 300.882400 |
| 5 | BloodPressureProblems | 180.503577 |
| 1 | Height | -58.760410 |
| 4 | Diabetes | -429.119839 |
| 3 | NumberOfMajorSurgeries | -489.870967 |
Since the model is intended for real-world use, it should be evaluated on data it has not seen during training. A common practice is to set aside a portion of the data as a test set; here, 10% of the rows are held out for evaluating the model's performance.
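Conceptually, `train_test_split(..., test_size=0.1)` shuffles the rows and holds out about 10% of them. The sketch below reproduces that idea with a NumPy permutation. `holdout_split` is a hypothetical helper written for illustration, not scikit-learn's implementation, and its fixed seed is an assumption added so the split is repeatable. The call in this notebook passes no `random_state`, so its split, and the losses reported from it, will vary from run to run.

```python
import math
import numpy as np

def holdout_split(inputs, targets, test_size=0.1, seed=42):
    """Hypothetical helper: shuffle rows, then hold out a fraction as a test set."""
    inputs, targets = np.asarray(inputs), np.asarray(targets)
    n = len(inputs)
    n_test = math.ceil(test_size * n)              # round the test size up
    order = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return inputs[train_idx], inputs[test_idx], targets[train_idx], targets[test_idx]

# Example with 986 dummy rows, the same row count as medical_df
X = np.arange(986 * 2).reshape(986, 2)
y = np.arange(986)
X_train, X_test, y_train, y_test = holdout_split(X, y, test_size=0.1)
print(X_train.shape, X_test.shape)   # → (887, 2) (99, 2)
```

Shuffling before splitting matters because rows may be stored in some systematic order; taking the last 10% without shuffling could give an unrepresentative test set.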
inputs_train, inputs_test, targets_train, targets_test = train_test_split(inputs, targets, test_size=0.1)
# Create and train the model
model = LinearRegression().fit(inputs_train, targets_train)
# Generate predictions
predictions_test = model.predict(inputs_test)
# Compute loss to evaluate the model
loss = rmse(targets_test, predictions_test)
print('Test Loss:', loss)
Test Loss: 4435.015156954542
# Generate predictions
predictions_train = model.predict(inputs_train)
# Compute loss to evaluate the model
loss = rmse(targets_train, predictions_train)
print('Training Loss:', loss)
Training Loss: 3646.0431646982643